Identifying Multi-instance Outliers

Authors

  • Ou Wu
  • Jun Gao
  • Weiming Hu
  • Bing Li
  • Mingliang Zhu
Abstract

This paper studies a new data mining problem called multi-instance outlier identification. The problem arises in tasks where each sample is described by many alternative feature vectors (instances). We define multi-instance outliers and analyze their basic types. Two general identification approaches are proposed, both built on the state-of-the-art (single-instance) outlier detector LOF (local outlier factor). The first approach borrows the underlying mechanism of the kernel method and plugs a set distance into LOF to detect multi-instance outliers. The second approach takes each instance's neighborhood into account. Based on the two approaches, four concrete multi-instance outlier detectors are then introduced. We conduct experiments on four synthetic data collections and three real-world data collections (two Musk data sets [22, 23] and a hard-drive inspection data set [24]). The experimental results show that the proposed multi-instance outlier detectors are effective, while algorithms that ignore the multi-instance setting perform poorly. In particular, the results on the two Musk sets are consistent with the multi-instance learning results, and the results on the hard-drive inspection data set demonstrate that multi-instance outlier identification is promising for real applications.
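The first approach described in the abstract, plugging a set distance into LOF, can be illustrated with a minimal sketch. The abstract does not say which set distance the authors use, so the symmetric Hausdorff distance below is an assumption, and the names `hausdorff`, `lof_scores`, and the toy `bags` are hypothetical. The LOF computation follows the standard single-instance definition, applied to bags instead of points; it is not the authors' implementation.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two instances (feature vectors)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hausdorff(bag_a, bag_b):
    """Assumed set distance: symmetric Hausdorff distance between two bags."""
    d_ab = max(min(euclidean(a, b) for b in bag_b) for a in bag_a)
    d_ba = max(min(euclidean(a, b) for a in bag_a) for b in bag_b)
    return max(d_ab, d_ba)

def lof_scores(bags, k, set_distance=hausdorff):
    """Standard LOF computed over bags, using a set distance between bags.

    Assumes k < len(bags) and that no bag has k or more exact duplicates
    (otherwise a reachability sum could be zero).
    """
    n = len(bags)
    dist = [[set_distance(bags[i], bags[j]) for j in range(n)] for i in range(n)]

    k_dist, neighbors = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]                      # distance to the k-th nearest bag
        k_dist.append(kd)
        neighbors.append([j for d, j in others if d <= kd])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbors' densities to the bag's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]

# Toy data: five bags near the origin and one bag far away.
bags = [
    [(0.0, 0.0), (0.1, 0.2)],
    [(0.2, 0.1), (0.0, 0.3)],
    [(0.1, 0.1), (0.3, 0.0)],
    [(0.3, 0.2), (0.2, 0.3)],
    [(0.1, 0.3), (0.2, 0.0)],
    [(5.0, 5.0), (5.2, 4.9)],
]
scores = lof_scores(bags, k=2)
outlier_index = max(range(len(bags)), key=lambda i: scores[i])
```

On this toy data the distant bag receives the largest score, while the bags near the origin score close to 1. Any other set distance (for example, the minimum pairwise distance) can be passed through the `set_distance` parameter without changing the LOF code.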

Similar resources

Simultaneous robust estimation of multi-response surfaces in the presence of outliers

A robust approach should be considered when estimating regression coefficients in multi-response problems. Many models are derived from the least squares method. Because the presence of outlier data is unavoidable in most real cases and because the least squares method is sensitive to these types of points, robust regression approaches appear to be a more reliable and suitable method for addres...

TRASMIL: A local anomaly detection framework based on trajectory segmentation and multi-instance learning

Local anomaly detection refers to detecting small anomalies or outliers that exist in some subsegments of events or behaviors. Such local anomalies are easily overlooked by most existing approaches, since those are designed to detect global or large anomalies. In this paper, an accurate and flexible three-phase framework, TRASMIL, is proposed for local anomaly detection based on TRAjector...

Learning Instance Specific Distance for Multi-Instance Classification

Multi-Instance Learning (MIL) deals with problems where each training example is a bag, and each bag contains a set of instances. Multi-instance representation is useful in many real-world applications because it can capture more structural information than the traditional flat single-instance representation. However, it also brings new challenges. Specifically, the distance between data ob...

Search for outliers in abnormal data

The most popular methods for identifying outliers are based on the assumption that the underlying generative model is multivariate Gaussian with a given set of parameters. In general, however, real observed data are abnormal, so inference based on the normality assumption may give very inaccurate results and should not be trusted. I show the search for outliers in two real abnormal data s...

Detecting Unusual Input-Output Associations in Multivariate Conditional Data

Despite tremendous progress in outlier detection research in recent years, the majority of existing methods are designed only to detect unconditional outliers that correspond to unusual data patterns expressed in the joint space of all data attributes. Such methods are not applicable when we seek to detect conditional outliers that reflect unusual responses associated with a given context or co...

Publication date: 2010